Probabilistic Models of Word Order and Syntactic Discontinuity
نویسنده
چکیده
This thesis takes up the problem of syntactic comprehension, or parsing—how an agent (human or machine) with knowledge of a specific language goes about inferring the hierarchical structural relationships underlying a surface string in the language. I take the position that probabilistic models of combining evidential information are cognitively plausible and practically useful for syntactic comprehension. In particular, the thesis applies probablistic methods in investigating the relationship between word order and psycholinguistic models of comprehension; and in the practical problems of accuracy and efficiency in parsing sentences with syntactic discontinuity. On the psychological side, the thesis proposes a theory of expectation-based processing difficulty as a consequence of probabilistic syntactic disambiguation: the ease of processing a word during comprehension is determined primarily by the degree to which that word is expected. I identify a class of syntactic phenomena, associated primarily with verb-final clause order, where the predictions of expectation-based processing diverge most sharply from more established locality-based theories of processing difficulty. Using existing probabilistic parsing algorithms and syntactically annotated data sources, I show that the expectation-based theory matches a range of established experimental psycholinguistic results better than locality-based theories. The comparison of probabilisticand locality-driven processing theories is a crucial area of psycholinguistic research due to its implications for the relationship between linguistic production and comprehension, and more generally for theories of modularity in cognitive science. The thesis also takes up the problem of probabilistic models for discontinuous constituency, when phrases do not consist of continuous substrings of a sentence.
منابع مشابه
یک مدل موضوعی احتمالاتی مبتنی بر روابط محلّی واژگان در پنجرههای همپوشان
A probabilistic topic model assumes that documents are generated through a process involving topics and then tries to reverse this process, given the documents and extract topics. A topic is usually assumed to be a distribution over words. LDA is one of the first and most popular topic models introduced so far. In the document generation process assumed by LDA, each document is a distribution o...
متن کاملEarly Syntactic Bootstrapping in an Incremental Memory-Limited Word Learner
It has been suggested that early human word learning occurs across learning situations and is bootstrapped by syntactic regularities such as word order. Simulation results from ideal learners and models assuming prior access to structured syntactic and semantic representations suggest that it is possible to jointly acquire word order and meanings and that learning is improved as each language c...
متن کاملStudying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...
متن کاملProbabilistic models of similarity in syntactic context
This paper investigates novel methods for incorporating syntactic information in probabilistic latent variable models of lexical choice and contextual similarity. The resulting models capture the effects of context on the interpretation of a word and in particular its effect on the appropriateness of replacing that word with a potentially related one. Evaluating our techniques on two datasets, ...
متن کاملUsing Probabilistic-Risky Programming Models in Identifying Optimized Pattern of Cultivation under Risk Conditions (Case Study: Shoshtar Region)
Using Telser and Kataoka models of probabilistic-risky mathematical programming, the present research is to determine the optimized pattern of cultivating the agricultural products of Shoshtar region under risky conditions. In order to consider the risk in the mentioned models, time period of agricultural years 1996-1997 till 2004-2005 was taken into account. Results from Telser and Kataoka mod...
متن کامل